Fast and Scalable HPSG Parsing

Authors

  • Takashi Ninomiya
  • Yoshimasa Tsuruoka
  • Yusuke Miyao
  • Kenjiro Taura
  • Jun’ichi Tsujii
Abstract

We investigated the efficacy of beam search parsing and deep parsing techniques in probabilistic HPSG parsing. We first tested beam thresholding and iterative parsing. Next, we tested three techniques originally developed for deep parsing: quick check, large constituent inhibition, and hybrid parsing with a CFG chunk parser. Quick check, iterative parsing, and hybrid parsing greatly contributed to total parsing performance. The accuracy and average parsing time on the Penn Treebank were 87.2% and 355 ms. Finally, we tested the robustness and scalability of HPSG parsing on the MEDLINE corpus, which consists of around 1.4 billion words. The entire corpus was parsed in 9 days with 340 CPUs.
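As a rough illustration of the first two techniques named in the abstract, the following is a minimal, generic sketch of beam thresholding on a single chart cell and of iterative parsing that retries with a wider beam. It is not the authors' implementation: the edge representation, the function names `prune_cell` and `iterative_parse`, and all threshold values are assumptions chosen only for illustration.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical edge representation: (label, log-probability).
Edge = Tuple[str, float]


def prune_cell(edges: List[Edge], max_edges: int, log_width: float) -> List[Edge]:
    """Beam thresholding on one chart cell.

    Keeps at most `max_edges` edges, and only those whose log-probability
    lies within `log_width` of the best edge in the cell.
    """
    if not edges:
        return []
    ranked = sorted(edges, key=lambda e: e[1], reverse=True)
    best_score = ranked[0][1]
    return [e for e in ranked[:max_edges] if e[1] >= best_score - log_width]


def iterative_parse(
    parse_with_beam: Callable[[int, float], Optional[str]],
    start: Tuple[int, float] = (2, 2.0),
    step: Tuple[int, float] = (2, 2.0),
    limit: Tuple[int, float] = (8, 8.0),
) -> Optional[str]:
    """Iterative parsing: start with a narrow beam and widen it on failure.

    `parse_with_beam(max_edges, log_width)` is an assumed callback that
    returns a parse result, or None if no parse was found with that beam.
    """
    max_edges, log_width = start
    while max_edges <= limit[0] and log_width <= limit[1]:
        result = parse_with_beam(max_edges, log_width)
        if result is not None:
            return result
        max_edges += step[0]
        log_width += step[1]
    return None
```

The intent of such a scheme is that most sentences parse under the narrow initial beam, so the cost of the wider beams is paid only for the sentences that need them.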


Similar resources

Implementing Hpsg with Modular Tools for Fast Compiling and Parsing

We describe a modular HPSG implementation, based on a set of tools rather than a single monolithic engine such as ALE. With these tools we can use techniques for much faster compiling and parsing than ALE. We use two-stage grammar compilation with partial execution and a concurrent process implementation of the chart for fast parsing. We compile HPSG lexical rules into Prolog rules used at run...


Extremely Lexicalized Models for Accurate and Fast HPSG Parsing

This paper describes an extremely lexicalized probabilistic model for fast and accurate HPSG parsing. In this model, the probabilities of parse trees are defined with only the probabilities of selecting lexical entries. The proposed model is very simple, and experiments revealed that the implemented parser runs around four times faster than the previous model and that the proposed model has a h...


CuteForce - Deep Deterministic HPSG Parsing

We present a deterministic HPSG parser capable of processing text incrementally with very fast parsing times. Our system demonstrates an efficient data-driven approach that achieves a high level of precision. Through a series of experiments in different configurations, we evaluate our system and compare it to current state-of-the-art within the field, and show that high quality deterministic pa...


Introduction to Data-Oriented Parsing

We present HPSG–DOP, a method for automatically extracting a Stochastic Lexicalized Tree Grammar (SLTG) from an HPSG source grammar and a given corpus. Processing of an SLTG is performed by a specialized fast parser. The approach has been tested on a large English grammar and has been shown to achieve an additional performance increase compared to parsing with a highly tuned HPSG parser. Our appr...


HPSG Parsing with Shallow Dependency Constraints

We present a novel framework that combines strengths from surface syntactic parsing and deep syntactic parsing to increase deep parsing accuracy, specifically by combining dependency and HPSG parsing. We show that by using surface dependencies to constrain the application of wide-coverage HPSG rules, we can benefit from a number of parsing techniques designed for high-accuracy dependency parsing...




Publication date: 2006